15 research outputs found

    Power, Reliability, Performance: One System to Rule Them All

    Get PDF
    In a design based on the Charm++ parallel programming framework, an adaptive runtime system interacts dynamically with the datacenter's resource manager to control power through intelligent job scheduling, resource reallocation, and hardware reconfiguration. It simultaneously manages reliability by cooling the system to the level that is optimal for the running application, and maintains performance through load balancing.
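    As a rough illustration of the control loop described above, the following Python sketch shows how an adaptive runtime might trade off power, reliability, and performance in one iteration. All class names, thresholds, and numbers are hypothetical assumptions and do not reflect the actual Charm++ runtime or resource-manager interface.

```python
# Hypothetical sketch of the runtime / resource-manager interaction described
# above; all names, thresholds, and numbers are illustrative, not the actual
# Charm++ adaptive runtime API.

from dataclasses import dataclass


@dataclass
class NodeState:
    node_id: int
    power_watts: float     # measured node power
    temperature_c: float   # hottest core temperature
    load: float            # normalized work currently assigned to the node


def control_step(nodes: list[NodeState], job_power_cap_watts: float) -> None:
    """One iteration of the control loop: keep the job under its power
    budget, cool hot nodes, and rebalance load toward the average."""
    # Power: if the job exceeds its budget, scale back per-node power
    # (a stand-in for frequency scaling / hardware reconfiguration).
    total_power = sum(n.power_watts for n in nodes)
    if total_power > job_power_cap_watts:
        scale = job_power_cap_watts / total_power
        for n in nodes:
            n.power_watts *= scale

    # Reliability: shed some load from nodes running hotter than a threshold
    # (a stand-in for setting cooling to the application's optimal level).
    for n in nodes:
        if n.temperature_c > 80.0:
            n.load *= 0.9

    # Performance: nudge every node's load toward the average.
    avg_load = sum(n.load for n in nodes) / len(nodes)
    for n in nodes:
        n.load += 0.5 * (avg_load - n.load)


if __name__ == "__main__":
    cluster = [NodeState(0, 210.0, 84.0, 1.2), NodeState(1, 180.0, 70.0, 0.8)]
    control_step(cluster, job_power_cap_watts=360.0)
    for n in cluster:
        print(n)
```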

    Carbon Responder: Coordinating Demand Response for the Datacenter Fleet

    Full text link
    The increasing integration of renewable energy sources results in fluctuations in carbon intensity throughout the day. To mitigate their carbon footprint, datacenters can implement demand response (DR) by adjusting their load based on grid signals. However, this presents challenges for private datacenters with diverse workloads and services. One of the key challenges is efficiently and fairly allocating power curtailment across different workloads. In response to these challenges, we propose the Carbon Responder framework, which aims to reduce the carbon footprint of heterogeneous datacenter workloads by modulating their power usage. Unlike previous studies, Carbon Responder considers both online and batch workloads with different service-level objectives and develops accurate performance models to achieve performance-aware power allocation. The framework supports three alternative policies: Efficient DR, Fair and Centralized DR, and Fair and Decentralized DR. We evaluate the Carbon Responder policies using production workload traces from a private hyperscale datacenter. Our experimental results demonstrate that the efficient Carbon Responder policy reduces the carbon footprint by around 2x as much as baseline approaches adapted from existing methods, while the fair Carbon Responder policies distribute the performance penalties and carbon-reduction responsibility equitably among workloads.
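    The Python sketch below illustrates the idea behind performance-aware curtailment allocation in the spirit of the Efficient DR policy: take power from the workloads whose performance models predict the smallest penalty per watt until the grid target is met. The workload names, models, and numbers are invented for illustration and are not the paper's implementation.

```python
# Illustrative sketch of performance-aware power curtailment in the spirit of
# "Efficient DR"; workload names, models, and numbers are made up.

from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_watts: float       # current power draw
    min_power_watts: float   # floor implied by its service level objective
    penalty_per_watt: float  # modeled performance loss per watt curtailed


def efficient_dr(workloads: list[Workload], curtail_watts: float) -> dict[str, float]:
    """Greedily take power from the workloads whose performance model predicts
    the smallest penalty per watt, until the curtailment target is met."""
    cuts: dict[str, float] = {w.name: 0.0 for w in workloads}
    remaining = curtail_watts
    for w in sorted(workloads, key=lambda w: w.penalty_per_watt):
        headroom = w.power_watts - w.min_power_watts
        cut = min(headroom, remaining)
        cuts[w.name] = cut
        remaining -= cut
        if remaining <= 0:
            break
    return cuts


if __name__ == "__main__":
    fleet = [
        Workload("online-ranking", 500.0, 450.0, 0.020),  # latency-sensitive
        Workload("batch-training", 800.0, 300.0, 0.004),  # deferrable
    ]
    print(efficient_dr(fleet, curtail_watts=300.0))
    # The deferrable batch job absorbs most of the curtailment; the
    # latency-sensitive service is largely spared.
```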

    MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

    Full text link
    Training and deploying large machine learning (ML) models is time-consuming and requires significant distributed computing infrastructure. Based on real-world large-model training on datacenter-scale infrastructure, we show that 14-32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency, we develop an agile performance modeling framework to guide parallelization and hardware-software co-design strategies. Using a suite of real-world large ML models on state-of-the-art GPU training hardware, we demonstrate 2.24x and 5.27x throughput improvement potential for pre-training and inference scenarios, respectively.
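    A back-of-the-envelope model of the kind of analysis described above fits in a few lines: per-step time is compute time plus whatever communication is not hidden behind it. The numbers and the simple overlap model below are illustrative assumptions, not figures from the paper.

```python
# Toy model of exposed (non-overlapped) communication on the critical path.
# Numbers and the overlap model are illustrative, not from the paper.

def step_time_ms(compute_ms: float, comm_ms: float, overlap_fraction: float) -> float:
    """Per-iteration time when `overlap_fraction` of communication is hidden
    behind computation; the remainder is exposed on the critical path."""
    exposed_comm = comm_ms * (1.0 - overlap_fraction)
    return compute_ms + exposed_comm


if __name__ == "__main__":
    compute_ms, comm_ms = 70.0, 30.0   # e.g., 30% of GPU time in communication
    baseline = step_time_ms(compute_ms, comm_ms, overlap_fraction=0.0)
    overlapped = step_time_ms(compute_ms, comm_ms, overlap_fraction=0.9)
    print(f"baseline step: {baseline:.1f} ms")
    print(f"with overlap:  {overlapped:.1f} ms "
          f"({baseline / overlapped:.2f}x throughput)")
```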

    Parallel Programming with Migratable Objects: Charm++ in Practice

    Get PDF
    The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failures) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime-system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers, including Blue Gene/Q, Cray XE6, and Stampede.
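    The following Python sketch illustrates the over-decomposition idea behind migratable objects: create many more work objects than processors and periodically re-map them using measured loads. It mimics a simple greedy load-balancing strategy and is not the Charm++ API; all names and loads are illustrative.

```python
# Conceptual sketch of over-decomposition with migratable objects: many more
# work objects than processors, re-mapped greedily by measured load.

import heapq


def greedy_map(object_loads: list[float], num_procs: int) -> list[list[int]]:
    """Assign each object (heaviest first) to the currently least-loaded
    processor, as a runtime load balancer might after measuring loads."""
    heap = [(0.0, p) for p in range(num_procs)]   # (accumulated load, proc id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_procs)]
    order = sorted(range(len(object_loads)), key=lambda i: -object_loads[i])
    for obj in order:
        load, proc = heapq.heappop(heap)
        assignment[proc].append(obj)
        heapq.heappush(heap, (load + object_loads[obj], proc))
    return assignment


if __name__ == "__main__":
    # 16 migratable objects with uneven costs, mapped onto 4 processors.
    loads = [5.0, 1.0, 3.0, 7.0, 2.0, 2.0, 4.0, 1.0,
             6.0, 2.0, 3.0, 1.0, 5.0, 2.0, 4.0, 2.0]
    for p, objs in enumerate(greedy_map(loads, num_procs=4)):
        print(f"proc {p}: objects {objs}, load {sum(loads[o] for o in objs):.1f}")
```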

    SecNDP: Secure Near-Data Processing with Untrusted Memory

    Get PDF
    Today's data-intensive applications increasingly suffer from significant performance bottlenecks due to the limited memory bandwidth of the classical von Neumann architecture. Near-Data Processing (NDP) has been proposed to perform computation near memory or data storage to reduce data movement, improving performance and energy consumption. However, untrusted NDP processing units (PUs) introduce new threats to private and sensitive workloads, such as private database queries and private machine learning inference. Meanwhile, most existing secure hardware designs do not consider off-chip components trustworthy: once data leaves the processor, it must be protected, e.g., via block-cipher encryption. Unfortunately, current encryption schemes do not support computation over encrypted data stored in memory or storage, hindering the adoption of NDP techniques for sensitive workloads. In this paper, we propose SecNDP, a lightweight encryption and verification scheme that lets untrusted NDP devices perform computation over ciphertext and verify the correctness of linear operations. Our encryption scheme leverages arithmetic secret sharing from secure Multi-Party Computation (MPC) to support operations over ciphertext, and uses counter-mode encryption to reduce decryption latency. The security of the scheme is formally proven. Compared with a non-NDP baseline, secure computation with SecNDP significantly reduces memory bandwidth usage while providing security guarantees. We evaluate SecNDP on two workloads with distinct memory access patterns. With eight NDP units, we show a speedup of up to 7.46x and energy savings of 18% over an unprotected non-NDP baseline, approaching the performance gain attained by native NDP without protection. Furthermore, SecNDP does not require any security assumption on NDP to hold, and thus uses the same threat model as existing secure processors. SecNDP can be implemented without changing the NDP protocols or their underlying hardware design.
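    The toy example below illustrates the arithmetic-secret-sharing idea behind computing a linear operation over encrypted data: the processor keeps a keystream share, the untrusted NDP unit computes on the ciphertext share, and the two partial results are recombined on-chip. It uses a PRNG as a stand-in for a counter-mode keystream and omits verification entirely; it is a sketch of the general MPC technique, not the actual SecNDP scheme.

```python
# Simplified illustration of computing a dot product with public weights over
# additively secret-shared data. A PRNG stands in for the counter-mode
# keystream, and integrity verification is omitted.

import random

MOD = 2 ** 32  # arithmetic shares live in Z_{2^32}


def share(values: list[int], seed: int) -> tuple[list[int], list[int]]:
    """Split each value into (keystream share kept by the processor,
    ciphertext share stored in untrusted memory / NDP)."""
    rng = random.Random(seed)                        # stand-in keystream
    key_share = [rng.randrange(MOD) for _ in values]
    cipher_share = [(v - k) % MOD for v, k in zip(values, key_share)]
    return key_share, cipher_share


def dot_mod(xs: list[int], ws: list[int]) -> int:
    """Dot product modulo 2^32; linear, so it distributes over the shares."""
    return sum(x * w for x, w in zip(xs, ws)) % MOD


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    weights = [2, 7, 1, 8, 2]                        # public, known to the NDP unit

    key_share, cipher_share = share(data, seed=42)
    ndp_partial = dot_mod(cipher_share, weights)     # computed near memory
    cpu_partial = dot_mod(key_share, weights)        # computed on-chip

    result = (ndp_partial + cpu_partial) % MOD       # recombined inside the CPU
    assert result == dot_mod(data, weights)
    print("dot product recovered from shares:", result)
```

    Because the dot product is linear, each party can evaluate it on its own share and the true result is recovered simply by adding the partial results modulo the share modulus; the untrusted side never sees the plaintext.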

    DataPerf: Benchmarks for Data-Centric AI Development

    Full text link
    Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing dataset benchmarks. In response, we present DataPerf, a community-led benchmark suite for evaluating ML datasets and data-centric algorithms. We aim to foster innovation in data-centric AI through competition, comparability, and reproducibility. We enable the ML community to iterate on datasets, instead of just architectures, and we provide an open, online platform with multiple rounds of challenges to support this iterative development. The first iteration of DataPerf contains five benchmarks covering a wide spectrum of data-centric techniques, tasks, and modalities in vision, speech, acquisition, debugging, and diffusion prompting, and we support hosting new contributed benchmarks from the community. The benchmarks, online evaluation platform, and baseline implementations are open source, and the MLCommons Association will maintain DataPerf to ensure long-term benefits to academia and industry. Comment: NeurIPS 2023 Datasets and Benchmarks Track.

    Mitigating variability in HPC systems and applications for performance and power efficiency

    Get PDF
    Power consumption and process variability are two important, interconnected challenges for future large-scale High Performance Computing (HPC) data centers. For example, current production petaflop supercomputers consume more than 10 megawatts of machine and cooling power, which costs millions of dollars every year. As HPC moves towards exascale computing, these costs will increase and power consumption is expected to become a major concern. Not only the dynamic behavior of HPC applications but also the dynamic behavior of HPC systems makes it challenging to optimize the performance and power efficiency of large-scale applications. Dynamic application behavior includes irregular or imbalanced computation; dynamic system behavior includes thermal, power, and frequency variations among processors. Smart and adaptive runtime systems have great potential to handle these challenges transparently to the application. In this dissertation, I first analyze frequency, temperature, and power variations in large-scale HPC systems using thousands of cores and different applications. After identifying the cause of each of these variations, I propose solutions to mitigate them and improve performance and power efficiency. When analyzing frequency variation, I attribute manufacturing-related intrinsic differences in the chips' power efficiency as the culprit behind frequency variation under dynamic overclocking, and I propose speed-aware dynamic load balancing strategies to mitigate the resulting performance overhead. When analyzing temperature variation, I focus on inefficiencies in fan-based air cooling systems and propose proactive, decoupled fan control mechanisms that reduce temperature variations and cooling power consumption by predicting core temperatures with a learning-based model. When analyzing power variation, I identify manufacturing-related sources of power variation, both static and dynamic, and propose variation-aware node assembly methods to mitigate them. Finally, I propose a fine-grained runtime-based technique that mitigates application-level variations caused by the characteristics of the application itself (for example, applications with different kernel types or phases) in order to reduce energy consumption.
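    A minimal sketch of the speed-aware load balancing idea follows: when chips run at different frequencies under dynamic overclocking, give each processor work in proportion to its measured speed instead of an equal share. The frequencies and work units below are illustrative assumptions, not measurements from the dissertation.

```python
# Minimal sketch of speed-aware work partitioning under frequency variation.
# Frequencies and work units are illustrative.

def speed_aware_split(total_work: float, freqs_ghz: list[float]) -> list[float]:
    """Partition `total_work` proportionally to each processor's frequency."""
    total_freq = sum(freqs_ghz)
    return [total_work * f / total_freq for f in freqs_ghz]


if __name__ == "__main__":
    freqs = [2.9, 3.3, 3.1, 2.7]          # measured per-chip frequencies (GHz)
    shares = speed_aware_split(1000.0, freqs)

    # With an equal split the slowest chip dictates the finish time; with the
    # speed-aware split all chips finish at roughly the same time.
    equal_time = (1000.0 / len(freqs)) / min(freqs)
    aware_time = max(s / f for s, f in zip(shares, freqs))
    print(f"equal split finishes in       {equal_time:.1f} units")
    print(f"speed-aware split finishes in {aware_time:.1f} units")
```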
